2025-04-24
In this lrm() fit, what do R2(20,1000) and R2(20,567) mean?
What can you do when you’re fitting a model to predict a multi-categorical outcome with highly unbalanced categories?
Options include
What can you do if you don’t believe the proportional hazards assumption in a time-to-event analysis?
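One common diagnostic and one common workaround can be sketched in R with the survival package; the lung data and the age/sex model here are illustrative choices, not part of the original question:

```r
library(survival)

# Fit a Cox model, then test the proportional hazards (PH) assumption
# using scaled Schoenfeld residuals
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
cox.zph(fit)  # small p-values flag covariates whose effect drifts over time

# One workaround: stratify on the offending covariate, giving each
# stratum its own baseline hazard instead of assuming a common one
fit_strat <- coxph(Surv(time, status) ~ age + strata(sex), data = lung)
```

Other options include modeling a time-varying coefficient or switching to a model that does not assume proportional hazards, such as an accelerated failure time model.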
Gelman A (2008) Scaling regression inputs by dividing by two standard deviations (PDF)
Interpretation of regression coefficients is sensitive to the scale of the inputs. One method often used to place input variables on a common scale is to divide each numeric variable by its standard deviation. Here we propose dividing each numeric variable by two times its standard deviation…. The resulting coefficients are then directly comparable for untransformed binary predictors.
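Gelman's rescaling is a one-liner in base R; a minimal sketch with made-up data:

```r
# Gelman's rescaling: center a numeric input and divide by TWO standard
# deviations, so its coefficient is directly comparable to that of an
# untransformed 0/1 binary predictor
rescale_2sd <- function(x) (x - mean(x)) / (2 * sd(x))

age <- c(34, 51, 29, 62, 45, 38, 57, 41)  # made-up data
age_z2 <- rescale_2sd(age)
sd(age_z2)  # 0.5 by construction
```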
What do you do to transform data that aren’t tamed well by a power transformation?
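One answer sometimes offered (not stated in the original, so treat it as an assumption) is the inverse hyperbolic sine, which behaves like a log for large values but, unlike the log or the Box-Cox family, is defined at zero and for negative values; base R has it built in:

```r
# asinh(x) = log(x + sqrt(x^2 + 1)): log-like for large |x|,
# roughly linear near zero, and defined for zero and negative values
x <- c(-50, -1, 0, 1, 50, 5000)
asinh(x)

# For large x, asinh(x) is very close to log(2 * x)
asinh(5000) - log(2 * 5000)  # nearly zero
```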
How do I incorporate something other than a weakly informative prior in stan_glm() when I fit a Bayesian regression?
How do I ingest an ASCII file into R, like [these .dat files associated with the YRBS](https://www.cdc.gov/yrbs/data/index.html)?
read_delim(), which works like read_csv() and is part of the readr package in the core tidyverse.

Dr. Love, thanks for the information about the tidyverse, but I need to code something in base R now for some horrible reason. Is there a place I can go to get an idea of what’s happening there?
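For the ASCII-ingestion question above, the base R counterpart to read_delim() is read.table() (with read.fwf() for fixed-width layouts); a minimal sketch with a made-up whitespace-delimited file, not an actual YRBS file:

```r
# Base R reader for whitespace- or tab-delimited ASCII files
tf <- tempfile(fileext = ".dat")
writeLines(c("id age smoker",
             "1  16  0",
             "2  17  1"), tf)

dat <- read.table(tf, header = TRUE)  # default sep = "" splits on whitespace
dat

# For fixed-width ASCII (common for survey .dat files), see read.fwf(),
# which takes a vector of column widths instead of a separator.
```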
Some things are always true…
but the impact of lots of other things changes depending on why you’re fitting a regression model.
Decide what your research question is, and use it to help you think about what’s important in your modeling.
What are some of the reasons you might fit a linear (or generalized linear) model?
Jonathan Falk writes:
So I just started reading The Mind Club, which came to me highly recommended. I’m only in chapter 2. But look at the above graph, which is used thusly…
As figure 5 reveals, there was a slight tendency for people to see more mind (rated consciousness and capacity for intention) in faster animals (shown by the solid sloped line)—it is better to be the hare than the tortoise. The more striking pattern in the graph is an inverted U shape (shown by the dotted curve), whereby both very slow and very fast animals are seen to have little mind, and human-speeded animals like dogs and cats are seen to have the most mind. This makes evolutionary sense, as potential predators and prey are all creatures moving at roughly our speed, and so it pays to understand their intentions and feelings. In the modern world we seldom have to worry about catching deer and evading wolves, but timescale anthropomorphism stays with us; in the dance of perceiving other minds, it pays to move at the same speed as everyone else.
That “inverted U shape” seems a bit housefly-dependent, wouldn’t you say? And how is the “slight tendency” less “striking” than this putative inverse U shape?
Yeah, that quadratic curve is nuts. As is the entire theory.
Also, what’s the scale of the x-axis on that graph? If a sloth’s speed is 35, the wolf should be more than 70, no? This seems like the psychology equivalent of that political science study that said that North Carolina was less democratic than North Korea.
Falk sent me the link to the article, and it seems that the speed numbers are survey responses for “perceived speed of movement.” GIGO all around!
https://www.johndcook.com/blog/2023/04/23/confidence-interval/
Suppose I want to know what percentage of artists are left handed, so I survey 400 artists and find that 127 of those surveyed were southpaws. Using the most common approach, a 95% confidence interval is (0.272, 0.363).
This comes from:
\[ \hat{p} \pm Z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \]
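Plugging the survey numbers into this formula reproduces the interval; a base R check:

```r
# Wald interval for 127 left-handers out of 400 artists surveyed
n <- 400
p_hat <- 127 / n                      # 0.3175
se <- sqrt(p_hat * (1 - p_hat) / n)
z <- qnorm(0.975)                     # about 1.96
round(p_hat + c(-1, 1) * z * se, 3)   # 0.272 0.363
```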
Suppose we redo our analysis using a Bayesian approach. Say we start with a uniform prior on \(\theta\).
Looking at the density function, we can then say in clear conscience that there is a 94% posterior probability that \(\theta\) is in the interval (0.272, 0.363).
There are a couple predictable objections at this point. First, we didn’t get exactly 95%. No, we didn’t. But we got very close.
Second, the posterior probability depends on the prior probability. However, it doesn’t depend much on the prior.
Suppose you said “I’m pretty sure most people are right handed, maybe 9 out of 10, so I’m going to start with a beta(1, 9) prior.” If so, you would compute the probability of \(\theta\) being in the interval (0.272, 0.373) to be 0.948.
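Both posterior probabilities are one-line pbeta() calculations in base R, since a beta prior plus binomial data gives a beta posterior:

```r
# Uniform prior = beta(1, 1); with 127 successes and 273 failures the
# posterior is beta(1 + 127, 1 + 273) = beta(128, 274)
diff(pbeta(c(0.272, 0.363), 128, 274))  # close to the nominal 95%

# Skeptical beta(1, 9) prior: the posterior is beta(128, 282)
diff(pbeta(c(0.272, 0.373), 128, 282))
```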
Often frequentist and Bayesian analyses reach approximately the same conclusions. A Bayesian can view frequentist techniques as convenient ways to produce approximately correct Bayesian results. And a frequentist can justify using a Bayesian procedure because the procedure has good frequentist properties.
Preprint by Miller, Tuia and Prasad (2023-04-17)
Interpretation of Wide Confidence Intervals in Meta-Analytic Estimates: Is the ‘Absence of Evidence’ ‘Evidence of Absence’?
An updated Cochrane review on physical interventions to slow the spread of respiratory viruses has sparked debate among researchers and in the media over the interpretation of the results, leading Cochrane’s editor-in-chief to issue a statement attempting to clarify comments made by the lead author.
Among other topics, the review examined the effect of medical or surgical masks on the spread of respiratory viruses in the community and found a relative risk of 1.01 (95% CI 0.72 to 1.42) after pooling 6 trials. The authors of the Cochrane review concluded, “Wearing masks in the community probably makes little or no difference to the outcome of laboratory‐confirmed influenza/SARS‐CoV‐2 compared to not wearing masks”, and the first author and senior reviewer Tom Jefferson was quoted by a news outlet saying, “There is just no evidence that they make any difference. Full stop.”
In response, Cochrane editor-in-chief Karla Soares-Weiser issued an unprecedented clarification, stating,
“Many commentators have claimed that a recently-updated Cochrane Review shows that ‘masks don’t work’, which is an inaccurate and misleading interpretation. It would be accurate to say that the review examined whether interventions to promote mask wearing help to slow the spread of respiratory viruses, and that the results were inconclusive.”
The editor went on to specifically criticize Jefferson: Soares-Weiser said that one of the lead authors of the review even more seriously misinterpreted its finding on masks by saying in an interview that it proved “there is just no evidence that they make any difference.” In fact, Soares-Weiser said, “that statement is not an accurate representation of what the review found.”
We found that … the conclusions made by Jefferson and colleagues were not only appropriate, but in line with the standardized approach created by Cochrane. Further, Jefferson’s comment in the media about there being “no evidence that they make any difference” is consistent with their conclusion in the Cochrane review in which they stated, “Wearing masks in the community probably makes little or no difference to the outcome of laboratory‐confirmed influenza/SARS‐CoV‐2 compared to not wearing masks.”
We found no obvious difference between Jefferson’s review and other recent reviews that would justify the differential interpretation and treatment of this study or the unprecedented comments made over its findings. Clarifying comments of the editor-in-chief of Cochrane appear unjustified.
http://www.rossmanchance.com/artist/proceedings/cobb.pdf
(My remarks) address three concerns: fairness, grade inflation, and a third concern that for now I’ll simply label “Roger.” Each of the three concerns is linked to a corresponding attitude toward assessment.
(from Gary Larson)
I was trying to systematically pay attention to you.
I was trying to emphasize things you’ve done well and things you can fix.
I was trying to make it safe to screw up.
And I was abandoning fairness in favor of assessing your work in your context.
For all students, both the more prepared and the less prepared, both the quicker learners and the slower learners, misdirected notions of fairness encourage a sense of competition, discourage helping others, and encourage students to judge themselves and their accomplishments by comparing themselves with others, rather than judging themselves by what works best for them as individuals. … Two different students taking the same course will inevitably get different things from it. We should embrace that inevitable difference, and try to see that each student gets as much as possible from our course, regardless of starting place.
Instead of telling me what the value of a coefficient means in generic terms:
A one-unit increase in X is associated with an increase in Y of blah blah blah.
Talk about the impact of your predictors on your outcome using the actual context of the problem you’re studying. Be specific, not generic, to be more effective.
Show us graphs and tables that help us better understand how much we should “know” after your work about the relationships you observe in the data, and describe those things in terms of the actual problem under study.
… a science, not a branch of mathematics, but uses mathematical models as essential tools.
Statistics is an important tool in the data analysis/science toolbox. Statistics provides a coherent framework for thinking about random variation, and tools to partition data into signal and noise.
… more than just p values and how you get to them.
In fact, forget about null hypothesis significance testing entirely and concentrate instead on:
even in the rare and unfortunate case where an important and binary decision “must” be made.
… too important to be left to statisticians.
432 Class 27 | 2025-04-24 | https://thomaselove.github.io/432-2025/